Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition
Abstract
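Speaker recognition systems perform very well on datasets without noise and mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The types of speaker-related features play a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel Spectrogram and similar inputs rather than handcrafted features like MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise ratios and with mismatch in the utterances. Similar to the Mel Spectrogram, the Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, none of the studies have conducted an analysis of the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition. In addition, only limited studies have used the Cochleogram to develop speech-based models in noisy and mismatch conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in deep learning-based speaker recognition is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet model architectures. Speaker identification and verification performance of both Cochleogram and Mel Spectrogram features is evaluated. The results show that...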
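The abstract evaluates models on noise-added utterances at SNR levels from −5 dB to 20 dB but does not say how the noise was mixed in. The sketch below shows one common way to do this, assuming the noise is scaled so that the clean-to-noise power ratio matches the target SNR. The function names (`mix_at_snr`, `measured_snr_db`), the synthetic sine "utterance", and the Gaussian noise are illustrative assumptions, not the authors' pipeline.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; returns (noisy, scaled_noise).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # From SNR_dB = 10 * log10(P_clean / P_noise), solve for the noise power.
    target_noise_power = p_clean / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measured_snr_db(clean, scaled_noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(power(clean) / power(scaled_noise))


# Synthetic stand-ins (assumptions): a 220 Hz tone and white Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# The SNR range named in the abstract, sampled in 5 dB steps (step size assumed).
for snr_db in (-5, 0, 5, 10, 15, 20):
    noisy, scaled = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, scaled), 6))
```

In practice the clean signal would be a VoxCeleb1 utterance and the noise a recorded environmental sample. Either feature (Mel Spectrogram or Cochleogram) would then be computed from `noisy`.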
Similar resources
Robustness of phase based features for speaker recognition
This paper demonstrates the robustness of group-delay based features for speech processing. An analysis of group delay functions is presented, which shows that these features retain formant structure even in noise. Furthermore, a speaker verification task performed on the NIST 2003 database shows lower error rates compared with traditional MFCC features. We also mention about using feat...
Noise robust feature for automatic speech recognition based on mel-spectrogram gradient histogram
This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. If an ASR system is trained using a clean speech source, a noisy environment may cause a mismatch between the features from the recognition data and those from the training data. This mismatch deteriorates the recognition accuracy. Thus, unlike in existing speech features, a...
Learning Binaural Spectrogram Features for Azimuthal Speaker Localization
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about speaker’s position and yet invariant to sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to ...
Modulation spectrogram features for improved speaker diarization
We propose the use of modulation spectrogram features in speaker diarization. These features carry longer term characteristics of the acoustic signals than the widely used MFCCs, thus providing potential improvement by using both features in combination. Using the state-of-the-art ICSI speaker diarization system, an improvement of 20.77% relative DER is obtained on the NIST Rich Transcription 2...
Journal
Journal title: Applied Sciences
Year: 2022
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13010569